# Import required R libraries
library(fpp3)

Exercise 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Section a

Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer: The primary difference among the figures is the value of the bounds. From left to right the bounds get closer to zero. Yes, by the definition below, each figure indicates white noise.

Textbook definition of white noise from section 2.9:

“For a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm/\sqrt{T}\) where \(T\) is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines above). If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise.”

Section b

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer: The critical values are different distances from the mean of zero because, as noted above, the calculation is based on the length of the time series. The longer the time series the closer the critical values are to the mean. This implies the time series lengths increase from left to right as noted in the question itself. Each time series is based on random numbers, thus the autocorrelations will be unique to each figure. White noise indicates a mean near zero and a constant variance. As the question indicates random numbers, then differencing those random numbers should result in a mean near zero and constant variance.

Exercise 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
  filter(Symbol == 'AMZN') %>%
  autoplot(Close) +
  labs(title = "Amazon Daily Closing Stock Price",
       y = "Price")

Answer: Data is non-stationary as the plot shows a clear increasing trend.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  ACF(Close) %>%
  autoplot()

Answer: The ACF plot decreases to zero very slowly, and as the Hyndman textbooks indicates “for a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly.” Thus, the ACF plot shows non-stationary of data.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  PACF(Close) %>%
  autoplot()

Answer: The PACF plot shows correlation which of course matches the first lag of the ACF, but then lag 5, 11, 19, 25, 29, indicate correlation with \(y_t\) even after removing the intervening correlations. Thus, this indicates the data display non-stationarity.

Exercise 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Section a

Turkish GDP from global_economy.

turkey_gdp <- global_economy %>%
  filter(Country == 'Turkey')

lambda <- turkey_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

turkey_gdp %>%
  autoplot(box_cox(GDP, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed Turkish GDP with $\\lambda$ = ",
         round(lambda,2))))

Appears to have an increasing trend after Box-Cox transformation. Attempt first differencing.

turkey_gdp <- turkey_gdp %>%
  mutate(diff_bc_gdp = difference(box_cox(GDP, lambda)))

# Display plot of first difference
turkey_gdp %>%
  autoplot(diff_bc_gdp) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "First Difference of Transformed Turkish GDP with $\\lambda$ = ",
         round(lambda,2))))

Plot looks good, let’s apply Unit Root Test.

# Apply Unit Root Test
turkey_gdp %>%
  features(diff_bc_gdp, unitroot_kpss)
## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey     0.0889         0.1

Unit Root Test results in a very small test statistic and within the range expected for stationary data, so the p-value is greater than 0.1, thus concluding that the differenced data appear stationary.

Section b

Accommodation takings in the state of Tasmania from aus_accommodation.

Note: Quarterly data

Calculate lambda for the Box-Cox transformation.

tas_takings <- aus_accommodation %>%
  filter(State == 'Tasmania')

lambda <- tas_takings %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

Plot the original data, the data with Box-Cox, the first difference of the Box-Cox transformation, and then a second difference, as well.

tas_takings %>%
  transmute(
    `Takings` = Takings,
    `Box-Cox Takings` = box_cox(Takings, lambda),
    `Annual change in Box-Cox Takings` = difference(box_cox(Takings, lambda), 4),
    `Doubly differenced Takings` =
                     difference(difference(box_cox(Takings, lambda), 4), 1)
  ) %>%
  pivot_longer(-Date, names_to="Type", values_to="Takings") %>%
  mutate(
    Type = factor(Type, levels = c(
      "Takings",
      "Box-Cox Takings",
      "Annual change in Box-Cox Takings",
      "Doubly differenced Takings"))
  ) %>%
  ggplot(aes(x = Date, y = Takings)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Tasmanian Accomodation Takings", y = NULL)

Definitely appears to have a seasonal component.

Appears the doubly differencing is necessary.

tas_takings %>%
  mutate(diff_bc_takings = difference(box_cox(Takings, lambda), 4)) %>%
  features(diff_bc_takings, unitroot_kpss)
## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania     0.210         0.1
tas_takings %>%
  mutate(diff_bc_takings = difference(difference(box_cox(Takings, lambda), 4), 1)) %>%
  features(diff_bc_takings, unitroot_kpss)
## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania    0.0475         0.1

Based on the above output from the Unit Root Test, a second differencing of the Box-Cox transformation of the data is required to attain stationarity, which falls in line with the plots above.

tas_takings %>%
  mutate(box_cox_takings = box_cox(Takings, lambda)) %>%
  features(box_cox_takings, unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1
tas_takings %>%
  mutate(box_cox_takings2 = difference(box_cox(Takings, lambda), 4)) %>%
  features(box_cox_takings2, unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

I also attempted the unitroot_nsdiffs on the Box-Cox transformed data and unitroot_ndiffs on the differenced Box-Cox transformed data. The first test indicated a seasonal differencing of one required, but the test on the differenced transformed data indicated differencing was not needed. That being said, I’m going with the second differencing.

Section c

Monthly sales from souvenirs.

lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  autoplot(box_cox(Sales, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

souvenirs %>%
  transmute(
    `Sales` = Sales,
    `Box-Cox sales` = box_cox(Sales, lambda),
    `Annual change in Box-Cox sales` = difference(box_cox(Sales, lambda), 12),
    `Doubly differenced Box-Cox sales` =
                     difference(difference(box_cox(Sales, lambda), 12), 1)
  ) %>%
  pivot_longer(-Month, names_to="Type", values_to="Sales") %>%
  mutate(
    Type = factor(Type, levels = c(
      "Sales",
      "Box-Cox sales",
      "Annual change in Box-Cox sales",
      "Doubly differenced Box-Cox sales"))
  ) %>%
  ggplot(aes(x = Month, y = Sales)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Souvenirs Sales", y = NULL)

Definitely appears to have a seasonal component.

Appears the double differencing is needed.

souvenirs %>%
  mutate(diff_bc_sales = difference(box_cox(Sales, lambda), 12)) %>%
  features(diff_bc_sales, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.240         0.1
souvenirs %>%
  mutate(diff_bc_sales = difference(difference(box_cox(Sales, lambda), 12), 1)) %>%
  features(diff_bc_sales, unitroot_kpss)
## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0381         0.1

Similar to section b, based on the above output from the Unit Root Test, a second differencing of the Box-Cox transformation of the data is required to attain stationarity, which falls in line with the plots above.

souvenirs %>%
  mutate(box_cox_sales = box_cox(Sales, lambda)) %>%
  features(box_cox_sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1
# Now try differencing on the seasonal difference
souvenirs %>%
  mutate(box_cox_sales = difference(box_cox(Sales, lambda), 12)) %>%
  features(box_cox_sales, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

At least this time, the second differencing is needed to reach stationarity based on the output of the unitroot_nsdiffs and unitroot_ndiffs tests. But after knitting the RMD file, the ndiffs is resulting in zero, despite the execution of the above snippet resulting in 1 in RStudio. I’m still going with second differencing required.

Exercise 9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(8675309)

# Monthly data
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot(Turnover)

head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State      Industry               `Series ID`    Month Turnover
##   <chr>      <chr>                  <chr>          <mth>    <dbl>
## 1 Queensland Takeaway food services A3349718A   1982 Apr     26.7
## 2 Queensland Takeaway food services A3349718A   1982 May     27.3
## 3 Queensland Takeaway food services A3349718A   1982 Jun     26.2
## 4 Queensland Takeaway food services A3349718A   1982 Jul     25.2
## 5 Queensland Takeaway food services A3349718A   1982 Aug     25.6
## 6 Queensland Takeaway food services A3349718A   1982 Sep     26.7
# Box Cox
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed retail tunrover with $\\lambda$ = ",
         round(lambda,2))))

Plotting the data in the same manner as Exercise 9.3 for visual inspection.

myseries %>%
  transmute(
    `Turnover` = Turnover,
    `Box-Cox turnover` = box_cox(Turnover, lambda),
    `Annual change in Box-Cox turnover` = difference(box_cox(Turnover, lambda), 12),
    `Doubly differenced Box-Cox turnover` =
                     difference(difference(box_cox(Turnover, lambda), 12), 1)
  ) %>%
  pivot_longer(-Month, names_to="Type", values_to="Turnover") %>%
  mutate(
    Type = factor(Type, levels = c(
      "Turnover",
      "Box-Cox turnover",
      "Annual change in Box-Cox turnover",
      "Doubly differenced Box-Cox turnover"))
  ) %>%
  ggplot(aes(x = Month, y = Turnover)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title = "Retail Turnover", y = NULL)

# nsdiff
myseries %>%
  mutate(box_cox_turnover = box_cox(Turnover, lambda)) %>%
  features(box_cox_turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
##   State      Industry               nsdiffs
##   <chr>      <chr>                    <int>
## 1 Queensland Takeaway food services       1
# ndiff
# Now try differencing on the seasonal difference
myseries %>%
  mutate(box_cox_turnover = difference(box_cox(Turnover, lambda), 12)) %>%
  features(box_cox_turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
##   State      Industry               ndiffs
##   <chr>      <chr>                   <int>
## 1 Queensland Takeaway food services      1

Based on the above output from unitroot_nsdiffs and unitroot_ndiffs, second differencing is needed after applying the Box-Cox transformation.

Exercise 9.6

Simulate and plot some data from simple ARIMA models.

Section a

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Section b

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

sim %>% autoplot(y)

# phi=0.2
for(i in 2:100)
  y[i] <- 0.2*y[i-1] + e[i]
sim02 <- tsibble(idx = seq_len(100), y = y, index = idx)

# phi=1.0
for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
sim10 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim02 %>% autoplot(y)

sim10 %>% autoplot(y)

Lower the \(\phi_1\), the more oscillations occur around zero, the higher the \(\phi_1\) value, the fewer oscillations and thus the plot doesn’t center around zero.

Section c

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

\[MA(1)\ is\ y_t = c + \epsilon_t + \theta_1\epsilon_{t-1}\]

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6*e[i-1]

sim_ma1 <- tsibble(idx = seq_len(100), y = y, index = idx)

#head(sim_ma1)

Section d

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

sim_ma1 %>% autoplot(y)

# theta is 0.2
for(i in 2:100)
  y[i] <- e[i] + 0.2*e[i-1]
sim_ma02 <- tsibble(idx = seq_len(100), y = y, index = idx)

# theta is 1.0
for(i in 2:100)
  y[i] <- e[i] + 1.0*e[i-1]
sim_ma10 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_ma02 %>% autoplot(y)

sim_ma10 %>% autoplot(y)

The lower the value of \(\theta_1\), the close the plot stays around zero, but with the higher value of \(\theta_1\), the plot still remains around zero, but absolute values of y tend to be larger.

Section e

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6,\) \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

y <- numeric(100)
e <- rnorm(100)

phi <- 0.6
theta <- 0.6

for(i in 2:100)
  y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]

sim_arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)

Section f

Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

\[AR(2)\ is \ y_t = c + \phi_1y_{t-1} + \phi_2y_{t-2} + \epsilon_t\]

y <- numeric(100)
e <- rnorm(100)

for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]

sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Section g

Graph the latter two series and compare them.

sim_arma11 %>% autoplot(y)

sim_ar2 %>% autoplot(y)

The plot for ARMA(1,1) model appears close to stationary data. The variance does appear to increase as the plot moves from left to right.

As for the AR(2) model, the plot is certainly not stationary. The variance clearly grows as the plot moves from left to right.

Exercise 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

library(urca)
aus_airpassengers %>% head()
## # A tsibble: 6 x 2 [1Y]
##    Year Passengers
##   <dbl>      <dbl>
## 1  1970       7.32
## 2  1971       7.33
## 3  1972       7.80
## 4  1973       9.38
## 5  1974      10.7 
## 6  1975      11.1
# Year      Passengers

aus_airpassengers %>% autoplot()

# Box Cox
lambda <- aus_airpassengers %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

aus_airpassengers %>%
  autoplot(box_cox(Passengers, lambda))

Section a

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

# Generate ARIMA model
aus_air_mod <- aus_airpassengers %>% model(ARIMA(Passengers, stepwise = F))

Display model definition

# Output report to show model selected
report(aus_air_mod)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

Result is an ARIMA(0,2,1) model.

Check the residuals look like white noise

aus_air_mod %>% gg_tsresiduals()

Yes, the residuals appear to be white noise.

Plot forecasts for the next 10 periods

aus_air_fc <- aus_air_mod %>% forecast(h=10)

aus_air_fc
## # A fable: 10 x 4 [1Y]
## # Key:     .model [1]
##    .model                           Year Passengers .mean
##    <chr>                           <dbl>     <dist> <dbl>
##  1 ARIMA(Passengers, stepwise = F)  2017 N(75, 4.3)  74.8
##  2 ARIMA(Passengers, stepwise = F)  2018 N(77, 9.6)  77.0
##  3 ARIMA(Passengers, stepwise = F)  2019  N(79, 16)  79.2
##  4 ARIMA(Passengers, stepwise = F)  2020  N(81, 23)  81.3
##  5 ARIMA(Passengers, stepwise = F)  2021  N(84, 32)  83.5
##  6 ARIMA(Passengers, stepwise = F)  2022  N(86, 42)  85.7
##  7 ARIMA(Passengers, stepwise = F)  2023  N(88, 53)  87.9
##  8 ARIMA(Passengers, stepwise = F)  2024  N(90, 66)  90.1
##  9 ARIMA(Passengers, stepwise = F)  2025  N(92, 80)  92.3
## 10 ARIMA(Passengers, stepwise = F)  2026  N(94, 97)  94.5
autoplot(aus_air_fc, aus_airpassengers) +
  labs(title="Number of passengers (in millions) from Australian air carriers", y="(in millions)" )

Section b

Write the model in terms of the backshift operator.

Model: ARIMA(0,2,1)

\[(1-B)^2y_t = (1+\theta_1B)\epsilon_t\]

Section c

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

# Apparently drift just gets applied
aus_air_arima010 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

report(aus_air_arima010)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97
aus_air_fc_010 <- aus_air_arima010 %>% forecast(h=10)

aus_air_fc_010
## # A fable: 10 x 4 [1Y]
## # Key:     .model [1]
##    .model                            Year Passengers .mean
##    <chr>                            <dbl>     <dist> <dbl>
##  1 ARIMA(Passengers ~ pdq(0, 1, 0))  2017 N(74, 4.3)  74.0
##  2 ARIMA(Passengers ~ pdq(0, 1, 0))  2018 N(75, 8.5)  75.4
##  3 ARIMA(Passengers ~ pdq(0, 1, 0))  2019  N(77, 13)  76.9
##  4 ARIMA(Passengers ~ pdq(0, 1, 0))  2020  N(78, 17)  78.3
##  5 ARIMA(Passengers ~ pdq(0, 1, 0))  2021  N(80, 21)  79.7
##  6 ARIMA(Passengers ~ pdq(0, 1, 0))  2022  N(81, 26)  81.1
##  7 ARIMA(Passengers ~ pdq(0, 1, 0))  2023  N(83, 30)  82.5
##  8 ARIMA(Passengers ~ pdq(0, 1, 0))  2024  N(84, 34)  84.0
##  9 ARIMA(Passengers ~ pdq(0, 1, 0))  2025  N(85, 38)  85.4
## 10 ARIMA(Passengers ~ pdq(0, 1, 0))  2026  N(87, 43)  86.8
autoplot(aus_air_fc_010, aus_airpassengers) +
  labs(title="Number of passengers (in millions) from Australian air carriers", y="(in millions)" )

Section d

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

aus_air_arima212 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(aus_air_arima212)
## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  constant
##       -0.5518  -0.7327  0.5895  1.0000    3.2216
## s.e.   0.1684   0.1224  0.0916  0.1008    0.7224
## 
## sigma^2 estimated as 4.031:  log likelihood=-96.23
## AIC=204.46   AICc=206.61   BIC=215.43
aus_air_fc_212 <- aus_air_arima212 %>% forecast(h=10)

aus_air_fc_212
## # A fable: 10 x 4 [1Y]
## # Key:     .model [1]
##    .model                                Year Passengers .mean
##    <chr>                                <dbl>     <dist> <dbl>
##  1 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2017 N(73, 4.2)  73.2
##  2 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2018 N(76, 8.6)  75.6
##  3 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2019  N(77, 15)  77.1
##  4 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2020  N(78, 20)  77.7
##  5 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2021  N(80, 25)  79.5
##  6 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2022  N(81, 30)  81.3
##  7 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2023  N(82, 36)  82.2
##  8 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2024  N(84, 40)  83.6
##  9 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2025  N(85, 46)  85.4
## 10 ARIMA(Passengers ~ 1 + pdq(2, 1, 2))  2026  N(87, 51)  86.6
autoplot(aus_air_fc_212, aus_airpassengers) +
  labs(title="Number of passengers (in millions) from Australian air carriers", y="(in millions)" )

aus_air_arima212_noCon <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
report(aus_air_arima212_noCon)
## Series: Passengers 
## Model: NULL model 
## NULL model
aus_air_fc_212_noCon <- aus_air_arima212_noCon %>% forecast(h=10)

aus_air_fc_212_noCon
## # A fable: 10 x 4 [1Y]
## # Key:     .model [1]
##    .model                                Year Passengers .mean
##    <chr>                                <dbl>     <dist> <dbl>
##  1 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2017         NA    NA
##  2 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2018         NA    NA
##  3 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2019         NA    NA
##  4 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2020         NA    NA
##  5 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2021         NA    NA
##  6 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2022         NA    NA
##  7 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2023         NA    NA
##  8 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2024         NA    NA
##  9 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2025         NA    NA
## 10 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))  2026         NA    NA
autoplot(aus_air_fc_212_noCon, aus_airpassengers) +
  labs(title="Number of passengers (in millions) from Australian air carriers", y="(in millions)" )

# 212 produces NULL model
# 211 works though
# 112 works, too

Section e

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

aus_air_arima021 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

report(aus_air_arima021)
## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0571
## s.e.   0.0585    0.0213
## 
## sigma^2 estimated as 3.855:  log likelihood=-95.1
## AIC=196.21   AICc=196.79   BIC=201.63
aus_air_fc_021 <- aus_air_arima021 %>% forecast(h=10)

aus_air_fc_021
## # A fable: 10 x 4 [1Y]
## # Key:     .model [1]
##    .model                                Year Passengers .mean
##    <chr>                                <dbl>     <dist> <dbl>
##  1 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2017 N(75, 3.9)  75.4
##  2 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2018   N(78, 8)  78.2
##  3 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2019  N(81, 12)  81.1
##  4 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2020  N(84, 17)  84.0
##  5 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2021  N(87, 21)  87.0
##  6 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2022  N(90, 26)  90.0
##  7 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2023  N(93, 31)  93.1
##  8 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2024  N(96, 36)  96.3
##  9 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2025  N(99, 41)  99.5
## 10 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))  2026 N(103, 47) 103.
autoplot(aus_air_fc_021, aus_airpassengers) +
  labs(title="Number of passengers (in millions) from Australian air carriers", y="(in millions)" )

Comparison

Exercise 9.8

For the United States GDP series (from global_economy):

us_gdp <- global_economy %>%
  filter(Country == "United States") %>%
  mutate(GDP = GDP/1e9) # GDP in billions
  
head(us_gdp)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country       Code   Year   GDP Growth   CPI Imports Exports Population
##   <fct>         <fct> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 United States USA    1960  543.  NA     13.6    4.20    4.97  180671000
## 2 United States USA    1961  563.   2.30  13.7    4.03    4.90  183691000
## 3 United States USA    1962  605.   6.10  13.9    4.13    4.81  186538000
## 4 United States USA    1963  639.   4.40  14.0    4.09    4.87  189242000
## 5 United States USA    1964  686.   5.80  14.2    4.10    5.10  191889000
## 6 United States USA    1965  744.   6.40  14.4    4.24    4.99  194303000
us_gdp %>% autoplot(GDP)

Section a

if necessary, find a suitable Box-Cox transformation for the data;

# Box Cox
lambda <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

us_gdp %>%
  autoplot(box_cox(GDP, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed US GDP with $\\lambda$ = ",
         round(lambda,2))))

us_gdp <- us_gdp %>%
  mutate(GDP_T = box_cox(GDP, lambda))

us_gdp <- us_gdp %>%
  mutate(Diff = difference(GDP_T))

head(us_gdp)
## # A tsibble: 6 x 11 [1Y]
## # Key:       Country [1]
##   Country Code   Year   GDP Growth   CPI Imports Exports Population GDP_T   Diff
##   <fct>   <fct> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl> <dbl>  <dbl>
## 1 United… USA    1960  543.  NA     13.6    4.20    4.97  180671000  17.4 NA    
## 2 United… USA    1961  563.   2.30  13.7    4.03    4.90  183691000  17.6  0.215
## 3 United… USA    1962  605.   6.10  13.9    4.13    4.81  186538000  18.0  0.431
## 4 United… USA    1963  639.   4.40  14.0    4.09    4.87  189242000  18.4  0.330
## 5 United… USA    1964  686.   5.80  14.2    4.10    5.10  191889000  18.8  0.445
## 6 United… USA    1965  744.   6.40  14.4    4.24    4.99  194303000  19.3  0.517
us_gdp %>%
  ACF(Diff) %>%
  autoplot()

us_gdp %>%
  PACF(Diff) %>%
  autoplot()

Results in \(\lambda = 0.28\).

Section b

fit a suitable ARIMA model to the transformed data using ARIMA();

us_gdp_mod <- us_gdp %>% model(ARIMA(GDP_T))

report(us_gdp_mod)
## Series: GDP_T 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586    0.3428
## s.e.  0.1198    0.0276
## 
## sigma^2 estimated as 0.0461:  log likelihood=7.72
## AIC=-9.43   AICc=-8.98   BIC=-3.3

Section c

try some other plausible models by experimenting with the orders chosen;

us_gdp_mod111 <- us_gdp %>%
  model(ARIMA(GDP_T ~ pdq(1,1,1)))

us_gdp_mod211 <- us_gdp %>%
  model(ARIMA(GDP_T ~ pdq(2,1,1)))

us_gdp_mod112 <- us_gdp %>%
  model(ARIMA(GDP_T ~ pdq(1,1,2)))

us_gdp_mod101 <- us_gdp %>%
  model(ARIMA(GDP_T ~ pdq(1,0,1)))

Section d

choose what you think is the best model and check the residual diagnostics;

Section e

produce forecasts of your fitted model. Do the forecasts look reasonable?

Section f

compare the results with what you would obtain using ETS() (with no transformation).